Skip to content

include bin counts and edges in tablereport json output#2164

Open
jeromedockes wants to merge 6 commits into
skrub-data:mainfrom
jeromedockes:hist-in-report-json
Open

include bin counts and edges in tablereport json output#2164
jeromedockes wants to merge 6 commits into
skrub-data:mainfrom
jeromedockes:hist-in-report-json

Conversation

@jeromedockes

Copy link
Copy Markdown
Member

Now the json output of TableReport also contains the result of np.histogram on the part of the column used for the histogram plots (non-null + remove outliers)

it is the only piece of information needed to reconstruct everything that is displayed by the table report from the json output only

@jeromedockes jeromedockes added enhancement New feature or request TableReport anything related to the TableReport labels Jun 15, 2026
@jeromedockes jeromedockes marked this pull request as ready for review June 15, 2026 12:12
@jeromedockes jeromedockes added this to the Release 0.10 milestone Jun 15, 2026
@jeromedockes

Copy link
Copy Markdown
Member Author

tentatively adding to the 0.10 milestone because it is fairly simple and would be useful for skore but we can remove it if it ends up being a blocker

@rcap107 rcap107 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, though I'd like a couple more comments to make some code clearer.

Could you also extend the docs of the json() function to mention that it includes both the svg and the bins for the histograms? I think it's useful information because users may not expect that, or may need the data and not know where to get it from.

Comment thread skrub/_reporting/_plotting.py Outdated
col = sbd.to_float32(col)
values = sbd.to_numpy(col)
if sbd.is_any_date(col):
# numpy histogram does not handle datetimes but matplotlib does

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here explaining that this is converting dates to seconds since epoch? It's not clear from the code

summary["value_is_constant"] = False
summary["quantiles"] = quantiles
if not with_plots:
summary["histogram_data"] = _plotting.histogram_data(column)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a small comment here to explain what this is used for?

@jeromedockes

Copy link
Copy Markdown
Member Author

that it includes both the svg and the bins for the histograms

that seems very specific given that currently none of the content is described in the docs, and the way to use it is to create one and inspect the contents. we should describe it in documentation (and make it clear it is subject to change without notice), but I think it can be tackled in another pr that describes all the keys not just the histogram

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request TableReport anything related to the TableReport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants